1. Introduction



The human eye can see and read what is written or displayed, whether in natural handwriting
or in printed form. When a machine performs the same task, it is called handwriting recognition.
Handwriting recognition can be broken down into two categories: off-line and on-line.

• Off-line character recognition - Off-line character recognition takes a raster image from a
scanner (scanned images of paper documents), a digital camera or another digital input
source. The image is binarised based on, for instance, its colour pattern (colour or grey scale),
so that the image pixels are either 1 or 0.

• On-line character recognition - In on-line recognition, the information is presented to the
system as it is written, and recognition (of a character or word) is carried out at the same time.
Basically, the system accepts a string of (x,y) coordinate pairs from an electronic pen touching
a pressure-sensitive digital tablet.

In this chapter, we focus on an on-line, writer independent, cursive character recognition
engine. In what follows, we explain the importance of on-line handwriting recognition over
off-line, the necessity of a writer independent system, and the importance as well as the scope
of cursive scripts like Devanagari. Devanagari is considered one of the well-known cursive
scripts Jayadevan et al. (2011); Pal & Chaudhuri (2004). However, we also aim to include other
scripts related to the current study.

1.1 Why On-line?

Handwriting recognition technology has been developing for a few decades Arica &
Yarman-Vural (2001); Plamondon & Srihari (2000), and its applications remain challenging.
For example, OCR is becoming an integral part of document scanners, and is used in many
applications such as postal processing, script recognition, banking, security (signature
verification, for instance) and language identification. In handwriting recognition, feature
selection has been an important issue Øivind Due Trier et al. (1996). Both structural and
statistical features as well as their combination have been widely used Foggia et al. (1999);
Heutte et al. (1998). These features tend to vary since characters' shapes vary widely. As a
consequence, local structural properties like intersections of lines, number of holes, concave
arcs, end points and junctions change from time to time. These changes are mainly due to

• deformations, which can be any range of shape variations including geometric
transformations such as translation, rotation, scaling and even stretching; and

• defects, which yield imperfections due to printing, optics, scanning, binarisation as well as
poor segmentation.

In the state of the art of handwritten character recognition, several studies have shown that
off-line handwriting recognition offers lower classification rates than on-line recognition
Plamondon & Srihari (2000); Tappert et al. (1990). Furthermore, on-line data offers a
significant reduction in memory and therefore in space complexity. Another advantage is that
the digital pen or a digital form on a tablet device immediately transforms handwriting
into a digital representation that can be reused later without the risk of degradation
usually associated with ancient handwriting. For all these reasons, one can cite a
few works Boccignone et al. (1993); Doermann & Rosenfeld (1995); Qiao et al. (2006);
Viard-Gaudin et al. (2005) that mainly focus on recovering temporal information and writing
order from static handwriting images. On-line handwriting recognition systems
provide interesting results.

On-line character recognition involves the automatic conversion of strokes as they are written
on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as
pen-up/pen-down switching. Such data is known as digital ink and can be regarded as a
dynamic representation of handwriting. The obtained signal is converted into letter codes
which are usable within computer and character-processing applications.
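
As a minimal sketch of how digital ink can be held in memory, the snippet below splits a stream of pen samples into strokes using the pen-up/pen-down switching described above. The `(x, y, pen_down)` sample layout and the name `segment_strokes` are assumptions made for illustration; actual digitizers report their data in device-specific formats.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def segment_strokes(events: List[Tuple[float, float, bool]]) -> List[Stroke]:
    """Split a stream of (x, y, pen_down) samples into strokes.

    A stroke is the run of consecutive samples recorded while the pen
    touches the tablet; a pen-up sample closes the current stroke.
    """
    strokes: List[Stroke] = []
    current: Stroke = []
    for x, y, pen_down in events:
        if pen_down:
            current.append((x, y))
        elif current:
            strokes.append(current)
            current = []
    if current:  # the stream ended while the pen was still down
        strokes.append(current)
    return strokes


# Hypothetical sample stream: two strokes separated by one pen-up sample.
events = [(0, 0, True), (1, 1, True), (1, 1, False),
          (5, 0, True), (5, 2, True), (5, 4, True)]
ink = segment_strokes(events)
```

Each stroke keeps its points in writing order, which is the temporal information that a static image does not provide.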

The elements of an on-line handwriting recognition interface typically include: 

1. a pen or stylus for the user to write with, and a touch sensitive surface, which may be
integrated with, or adjacent to, an output display; and

2. a software application, i.e., a recogniser, which interprets the movements of the stylus across
the writing surface, translating the resulting strokes into digital characters.

Globally, it resembles one of the applications of pen computing, i.e., a computer user-interface
using a pen (or stylus) and tablet, rather than devices such as a keyboard, joystick or mouse.
Pen computing can be extended to the usage of mobile devices such as wireless tablet personal
computers, PDAs and GPS receivers.

Fig. 1: On-line stroke sequences in the form of 2D (x,y) coordinates. In this illustration, the initial
pen-tip position is coloured red and the pen-up (final point) is coloured blue.

Historically, pen computing (defined as a computer system employing a user-interface using a 
pointing device plus handwriting recognition as the primary means for interactive user input) 
predates the use of a mouse and graphical display by at least two decades, starting with the 
Stylator Dimond (1957) and RAND tablet Groner (1966) systems of the 1950s and early 1960s. 

1.2 Why Writer Independent? 

As mentioned before, on-line handwriting recognition systems provide interesting results
over almost all types of scripts. The recognition systems vary widely, which can be due to the
nature of the scripts employed, their particular difficulties, and the intended applications.
The performance of an application-based (commercial) recogniser is determined by its speed
in addition to its accuracy.

Among many approaches, template based approaches in particular have a long standing
record Bahlmann & Burkhardt (2004); Connell & Jain (1999); Hu et al. (1996); Santosh & Nattee
(2006a); Schenkel et al. (1995). In many cases, a writer independent recogniser has been
built so that a new user does not need to train the system - which is widely acceptable. In such
a context, the expected recognition system should automatically update or adapt to new users
once they provide input, or the previously trained recogniser should be able to discriminate new



1.3 Why Devanagari? 

The scope of interest is summarised in a few points.

1. Pencil and paper can be preferable for anyone preparing a first draft, instead of a
keyboard and other computer input interfaces, especially when writing in languages
and scripts for which keyboards are cumbersome. Devanagari keyboards, for instance, are
quite difficult to use. Devanagari characters follow a complex structure and may count up
to more than 500 symbols Jayadevan et al. (2011); Pal & Chaudhuri (2004).

2. Devanagari is a script used to write several Indian languages, including Nepali, Sanskrit, 
Hindi, Marathi, Pali, Kashmiri, Sindhi, and sometimes Punjabi. According to the 2001 
Indian census, 258 million people in India used Devanagari.

Fig. 2: A few samples of several different similar classes from Devanagari script.

3. Writing in one's own style brings unevenness in writing units, which is the most difficult
part to recognise. Variations in basic writing units such as the number of strokes, their order,
shapes and sizes, tilting angles and similarities among classes of characters are considered
important issues. In contrast to Roman script, this happens more in cursive scripts like
Devanagari.

Devanagari is written from left to right with a horizontal line on the top, which is the
shirorekha. Every character requires one shirorekha from which the text is suspended.
The way of writing Devanagari has its own particularities. In what follows, we briefly
explain a few major points and their associated difficulties.

• Many of the characters are similar to each other in structure. Visually very similar
symbols - even from the same writer - may represent different characters. While the
printed forms of such characters may seem easy to tell apart, confusion is likely to
occur between their handwritten counterparts. Fig. 2 shows a few examples.

• The number of strokes, their order, shapes and sizes, directions, skew angle etc. are
writing units that are important for symbol recognition and classification. However,
these writing units most often vary from one user to another, and there is no
guarantee that the same user always writes in the same way. Proposed methods should
take this into account.

Based on those major aforementioned reasons, there exists clear motivation to pursue research 
on Devanagari handwritten character recognition. 

1.4 Structure of the Chapter 



The remainder of the chapter is organised as follows. In Section 2, we start by detailing
the basic concept of a character recognition framework, in addition to the major highlights on
important issues: feature selection, matching and recognition. Section 3 gives a complete
outline of how we can efficiently achieve optimal recognition performance over cursive scripts
like Devanagari. In this section, we first present the complete process and then validate it
step by step with reasoning and a series of experimental tests over our own, publicly
available dataset. We conclude the chapter in Section 4.

Fig. 3: Learning strokes from the handwritten symbols. In this illustration, we present a basic concept to
form templates via clustering of the features of the strokes immediately after they are pre-processed.

2. Character Recognition Framework 

Basically, we can categorise a character recognition system into two modules: learning and
testing. In the learning or training module, following Fig. 3, handwritten strokes are learnt or
stored. The testing module follows the former one. The performance of the recognition system
depends on how well the handwritten strokes are learnt, which eventually comes down to the
techniques we employ.

Basically, the learning module employs stroke pre-processing, feature selection and clustering
to form the templates to be stored. Pre-processing and feature selection techniques can vary
from one application to another. For example, noisy stroke elimination or deletion in Roman
script cannot be directly extended to cursive scripts like Urdu and Devanagari. In other words,
these techniques are application dependent due to the different writing styles.
However, they are basically adapted from one another, and mostly ad-hoc techniques are built
so that optimal recognition performance is possible. In the framework of stroke-based feature
extraction and recognition, one can refer to Chiu & Tseng (1999); Zhou et al. (2007), for
example. It is important to notice that feature selection usually drives the way we match
features. As an example, fixed size feature vectors can be matched straightforwardly, while
for non-linear feature vector sequences, dynamic programming (elastic matching) has been
used Keogh & Pazzani (1999); Kruskall & Liberman (1983); Myers & Rabiner (1981);
Sakoe (1978). The concept was first introduced in the late 1950s Bellman & Kalaba (1959). Once
we have a way to find the similarity between the strokes' features, we apply a clustering
technique based on their similarity values. The clustering technique generates templates
as the representatives of the similar strokes provided. These stored templates are used in
the testing module, of which Fig. 4 provides a comprehensive idea.
More specifically, in this module, every test stroke is matched with the templates (learnt
in the training module) so that we can find the most similar one. This procedure is repeated
for all available test strokes. At the end, aggregating all matching scores indicates which
template class the test character is closest to.

2.1 Preprocessing 

Strokes directly collected from users are often incomplete and noisy. Different systems use
a variety of different pre-processing techniques before feature extraction Alginahi (2010);
Blumenstein et al. (2003); Verma et al. (2004). The techniques used in one system may
not exactly fit into another because of different writing styles and the nature of the scripts.
Very common steps are repeated coordinates deletion Bahlmann & Burkhardt (2004), noise
elimination and normalisation Chun et al. (2005); Guerfali & Plamondon (1993).

Fig. 4: An illustration of the testing module. As in the learning module, test characters are pre-processed
and we present a basic concept to form template via clustering of features of the strokes immediately
after they are pre-processed.

Besides pre-processing, in this chapter, we mainly focus on feature selection and matching 
techniques. 
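
Two of the common pre-processing steps named in this section, repeated coordinates deletion and normalisation, can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the choice of scaling the longer side of the bounding box to a fixed size while keeping the aspect ratio is an assumption, not necessarily the normalisation used in the cited systems.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def remove_repeated_points(points: List[Point]) -> List[Point]:
    """Drop consecutive duplicate coordinates (pen resting on one spot)."""
    cleaned: List[Point] = []
    for p in points:
        if not cleaned or p != cleaned[-1]:
            cleaned.append(p)
    return cleaned


def normalise_size(points: List[Point], size: float = 1.0) -> List[Point]:
    """Translate to the origin and scale the longer side to `size`,
    preserving the aspect ratio (assumed normalisation scheme)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    extent = max(max(xs) - x0, max(ys) - y0) or 1.0  # avoid division by zero
    return [((x - x0) * size / extent, (y - y0) * size / extent)
            for x, y in points]


raw = [(10, 10), (10, 10), (20, 10), (20, 30), (20, 30)]
clean = normalise_size(remove_repeated_points(raw))
```

If the relative positions of strokes matter, as they do later in this chapter, the normalisation would have to be applied to all points of a symbol at once rather than stroke by stroke.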

2.2 Feature Selection 

If you have the complete address of your friend, then you can easily find him/her without
additional help from other people on the way. A similar situation arises in character
recognition, where the address corresponds to the selected features. Therefore, selecting
complete or sufficient features from the provided input is the crucial point. In other words,
appropriate feature selection can greatly decrease the workload and simplify the subsequent
design process of the classifier.

In what follows, we discuss a few but major issues associated with feature selection.

• Pen-flow, i.e., speed while writing, determines how well the coordinates along the pen
trajectory are captured. Fast writing and writing with shivering hands do not provide
complete shape information of the strokes.

• Ratios of the relative height, width and size of letters are not always consistent - which is
obvious in natural handwriting.

• Pen-down and pen-up events provide stroke segmentation. But we do not know which
strokes are rewritten or overwritten, nor where.

• Slant writing style, or writing at some angle to the left or right, makes feature selection
difficult. For example, in those cases, zoning information using orthogonal projection does
not carry consistent information. This means that the zoning features will vary widely as
soon as we have different writing styles.



Appeared in 'Advances in Character Recognition', Editor - Xiaoqing Ding, ISBN 978-953-51-0823-8 Authors' copy 



Fig. 5: An illustration of feature selection: pen-tip position and tangent at every pen-tip position along
the pen trajectory, from the initial (pen-down) point to the end (pen-up) point.

To repeat, features should contain sufficient information to distinguish between classes, be
insensitive to irrelevant variability of the input, allow efficient computation of discriminant
functions and limit the amount of training data required Lippmann (1989).
However, they vary from one script to another Blumenstein et al. (2003); Namboodiri & Jain
(2004); Okumura et al. (2005); Verma et al. (2004).

Feature selection is always application dependent, i.e., it relies on the type of script used (its
characteristics and difficulties). In our case, the feature vector sequence of any
stroke is expressed as in Okumura et al. (2005); Santosh & Nattee (2006a); Santosh, Nattee &
Lamiroy (2012):

$$F = \left[ (p_1, \alpha_{p_1,p_2}),\ (p_2, \alpha_{p_2,p_3}),\ \ldots,\ (p_{l-1}, \alpha_{p_{l-1},p_l}) \right] \qquad (1)$$

where $\alpha_{p_{l-1},p_l} = \arctan\left( \frac{y_l - y_{l-1}}{x_l - x_{l-1}} \right)$ and $p_l = (x_l, y_l)$ is the $l$-th pen-tip position. Fig. 5 shows a complete illustration.

Our feature includes a sequence of both pen-tip positions and tangent angles sampled from
the trajectory of the pen-tip, preserving the directional property of the trajectory path. It
is important to recall that stroke direction (either left - right or right - left) leads to very
different features even though the strokes are geometrically similar. To handle this efficiently,
we need both kinds of strokes or samples for training and testing. This does not mean that the
same writer must be used.
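
The feature sequence of Eq. (1) can be sketched as below. One assumption is made loudly here: the angle is computed with `math.atan2` rather than a plain arctangent of the slope, so that the full direction of travel is kept and vertical segments do not cause a division by zero. The function name `stroke_features` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[Point, float]  # (pen-tip position, tangent angle in radians)


def stroke_features(stroke: List[Point]) -> List[Feature]:
    """Pair every pen-tip position (except the last) with the angle of the
    segment leading to the next position, in the spirit of Eq. (1)."""
    features: List[Feature] = []
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        # Assumption: atan2 is used instead of arctan(dy / dx).
        angle = math.atan2(y1 - y0, x1 - x0)
        features.append(((x0, y0), angle))
    return features


# The same horizontal line written in both directions.
forward = stroke_features([(0, 0), (1, 0), (2, 0)])
backward = stroke_features([(2, 0), (1, 0), (0, 0)])
```

Under this assumption the two strokes share the same shape but differ by an angle of pi in every feature, which illustrates why both writing directions are needed among the training samples.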

The idea is somewhat similar to the directional arrows that are composed of eight types, coded
from 0 to 7, one for each of the eight compass directions.

However, these directional arrows provide only the directional feature of the strokes or line
segments. Therefore, more information can be integrated if the relative length of the standard
strokes is taken into account Cha et al. (1999).
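
A small sketch of such an eight-direction code, extended with the relative length of each segment, is given below. The numbering convention (0 for east, then counter-clockwise in 45 degree steps) and the function names are assumptions made for illustration; the source does not fix them.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def direction_code(dx: float, dy: float) -> int:
    """Quantise a displacement into one of eight directions, coded 0-7.

    Assumed convention: 0 = east, then counter-clockwise in 45 degree steps.
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8


def chain_code(stroke: List[Point]) -> List[Tuple[int, float]]:
    """Return (direction code, relative length) for every segment."""
    segments = [(x1 - x0, y1 - y0)
                for (x0, y0), (x1, y1) in zip(stroke, stroke[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in segments]
    total = sum(lengths) or 1.0  # guard against a zero-length stroke
    return [(direction_code(dx, dy), length / total)
            for (dx, dy), length in zip(segments, lengths)]


codes = chain_code([(0, 0), (2, 0), (2, 2)])
```

The relative length stored next to each code is the extra information that a bare directional arrow does not carry.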

2.3 Feature Matching 

Besides discussing classifiers, we explain how features can be matched to obtain similarity
or dissimilarity values between them.

Matching techniques are often induced by how features are taken or strokes are represented.
For instance, normalising the feature vector sequence into a fixed size vector provides an
immediate matching. On the other hand, features having different lengths, or non-linear
features, need dynamic programming for approximate matching. Considering
the latter situation, we explain how dynamic programming is employed.

Dynamic time warping (DTW) allows us to find the dissimilarity between two non-linear
sequences potentially having different lengths Keogh & Pazzani (1999); Kruskall & Liberman
(1983); Myers & Rabiner (1981); Sakoe (1978). It is an algorithm particularly suited to
matching sequences with missing information, provided there are long enough segments for
matching to occur.

Let us consider two feature sequences

$$X = \{x_k\}_{k=1,\ldots,K} \quad \text{and} \quad Y = \{y_l\}_{l=1,\ldots,L}$$

of size K and L, respectively. The aim of the algorithm is to provide the optimal alignment
between both sequences. At first, a matrix M of size K x L is constructed. Then, for each
element in matrix M, the local distance metric $\delta(k,l)$ between the events $e_k$ and $e_l$ is
computed, i.e., $\delta(k,l) = (e_k - e_l)^2$. Let $D(k,l)$ be the global distance up to $(k,l)$,

$$D(k,l) = \min \left\{ D(k-1,l-1),\ D(k-1,l),\ D(k,l-1) \right\} + \delta(k,l)$$

with an initial condition $D(1,1) = \delta(1,1)$ such that it allows the warping path to go diagonally
from the starting node (1,1) to the end (K,L). The main aim is to find the path with which the
least cost is associated. The warping path therefore provides the difference cost between the
compared signatures. Formally, the warping path is,

$$W = \{w_t\}_{t=1,\ldots,T}$$

where $\max(K,L) \le T \le K + L - 1$ and the $t$-th element of W is $w(k,l)_t = (k_t, l_t) \in [1:K] \times [1:L]$
for $t \in [1:T]$. The optimised warping path W satisfies the following three conditions.

c1. boundary condition:

$$w_1 = (1,1) \quad \text{and} \quad w_T = (K,L).$$

c2. monotonicity condition:

$$k_1 \le k_2 \le \cdots \le k_T \quad \text{and} \quad l_1 \le l_2 \le \cdots \le l_T.$$

c3. continuity condition:

$$w_{t+1} - w_t \in \{(1,1),\ (0,1),\ (1,0)\} \quad \text{for } t \in [1:T-1].$$

c1 conveys that the path runs from (1,1) to (K,L), aligning all elements to each other. c2
forces the path to never go back. c3 restricts the allowable steps in the warping path to
adjacent cells, so that the path advances one step at a time. Note that c3 implies c2.

Fig. 6: Classical DTW algorithm - an alignment illustration between two non-linear sequences X and Y.
In this illustration, the diagonal DTW-matrix is shown, including how back-tracking has been employed.

We then define the global distance between X and Y as,

$$\Delta(X,Y) = \frac{D(K,L)}{T}.$$

The last element of the K x L matrix gives the DTW-distance between X and Y, which is
normalised by T, i.e., the number of discrete warping steps along the diagonal DTW-matrix.
The overall process is illustrated in Fig. 6.
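
The recurrence, the back-tracking and the normalisation by T can be put together in a short sketch. It is written for sequences of scalar events to keep it compact, which is a simplifying assumption: the features of Section 2.2 combine positions and angles and would need a suitable local distance. The function name `dtw` is hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (normalised DTW distance, warping path) for two scalar sequences.

    The path uses the 1-based (k, l) indices of the text.
    """
    K, L = len(x), len(y)
    INF = float("inf")
    # D is padded with one extra row and column so that D[k][l] matches the
    # 1-based D(k, l); D[0][0] = 0 yields D(1, 1) = delta(1, 1).
    D = [[INF] * (L + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for l in range(1, L + 1):
            delta = (x[k - 1] - y[l - 1]) ** 2
            D[k][l] = delta + min(D[k - 1][l - 1], D[k - 1][l], D[k][l - 1])
    # Back-track from (K, L) to (1, 1) through the cheapest predecessors.
    path: List[Tuple[int, int]] = []
    k, l = K, L
    while k > 0 and l > 0:
        path.append((k, l))
        _, k, l = min((D[k - 1][l - 1], k - 1, l - 1),
                      (D[k - 1][l], k - 1, l),
                      (D[k][l - 1], k, l - 1))
    path.reverse()
    return D[K][L] / len(path), path  # len(path) is T


distance, path = dtw([0, 1, 2, 3], [0, 1, 1, 2, 3])
```

In this example the repeated value in the second sequence is absorbed by one vertical step of the path, so the two sequences align at zero cost even though their lengths differ.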

Until now, we have provided a global concept of using the DTW distance for non-linear
sequence alignment. In order to provide faster matching, we have used the local constraint on
time warping proposed in Keogh (2002). We have $w(k,l)_t$ such that $l - r \le k \le l + r$, where r
is a term defining the reach, i.e., the allowed range of warping for a given event in a sequence.
With r, upper and lower bounding measures can be expressed as,

$$U_k = \max(x_{k-r} : x_{k+r}) \quad \text{(upper bound)}$$
$$L_k = \min(x_{k-r} : x_{k+r}) \quad \text{(lower bound)}.$$

Therefore, for all k, an obvious property of U and L is $U_k \ge x_k \ge L_k$. With this, we can define
a lower bounding measure for DTW:

$$\text{LB\_Keogh}(X,Y) = \sqrt{ \sum_{k=1}^{K} \begin{cases} (y_k - U_k)^2 & \text{if } y_k > U_k \\ (y_k - L_k)^2 & \text{if } y_k < L_k \\ 0 & \text{otherwise.} \end{cases} }$$

Since this provides only a quick introduction to the local constraint for the lower bounding
measure, we refer to Keogh (2002) for more clarification.

2.4 Recognition

From a purely combinatorial point of view, measuring the similarity or dissimilarity between
two symbols

$$S_1 = \{s_i^1\}_{i=1,\ldots,n} \quad \text{and} \quad S_2 = \{s_j^2\}_{j=1,\ldots,m}$$

composed, respectively, of n and m strokes, requires a one by one matching score computation
of all strokes $s_i^1$ with all $s_j^2$. This means that we align the individual test strokes of an
unknown symbol with the learnt strokes. As soon as we determine the test strokes associated
with the known class, the complete symbol can be compared by the fusion of matching
information from all test strokes. Such a concept is fundamental under the purview of
stroke-based character recognition.
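
The one by one matching and the fusion of matching information can be sketched as follows. The fusion rule used here, summing for every class the best distance of each test stroke, is an assumption chosen for illustration, as are the function name `classify`, the class labels and the toy distance; any stroke distance such as the DTW distance of Section 2.3 could be passed in instead.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Stroke = Sequence[float]


def classify(test_strokes: List[Stroke],
             templates: Dict[str, List[Stroke]],
             dist: Callable[[Stroke, Stroke], float]) -> Tuple[str, Dict[str, float]]:
    """Match every test stroke with every learnt stroke of every class and
    fuse the best scores per class by summation (assumed fusion rule)."""
    scores: Dict[str, float] = {}
    for label, learnt in templates.items():
        scores[label] = sum(min(dist(s, t) for t in learnt)
                            for s in test_strokes)
    best = min(scores, key=scores.get)
    return best, scores


def squared_distance(a: Stroke, b: Stroke) -> float:
    """Toy distance for equal-length strokes, standing in for DTW."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical classes "A" and "B", each with two learnt strokes.
templates = {"A": [[0, 1, 2], [5, 5, 5]],
             "B": [[2, 1, 0], [9, 9, 9]]}
label, scores = classify([[0, 1, 2.5], [5, 5, 6]], templates, squared_distance)
```

Nothing in this plain scheme checks where a stroke lies within the symbol, so two visually similar strokes from different positions can be matched to each other.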

Overall, the concept may not always be sufficient, and these approaches generally need a
final, global coherence check to avoid matching strokes that show visual similarity but
do not respect the overall geometric coherence within the complete handwritten character. In
other words, the matching strategy between test strokes and templates should be intelligent
rather than a straightforward one-to-many matching. However, it in fact depends on how the
templates are managed. In this chapter, this is one of the primary concerns. We highlight the
use of relative positioning of the strokes within the handwritten symbol and its direct impact
on the performance Santosh, Nattee & Lamiroy (2012).

3. Recognition Engine 

To keep the chapter coherent as well as consistent (with Devanagari character recognition), we
refer to the recognition engine, which is entirely based on previous studies Santosh
& Nattee (2006a;b; 2007); Santosh et al. (2010); Santosh, Nattee & Lamiroy (2012). Especially
because of the structure of Devanagari, it is necessary to pay attention to the appropriate
structuring of the strokes to ease and speed up comparison between the symbols, rather
than just relying on global recognition techniques that would be based on a collection of
strokes Santosh & Nattee (2006a). Therefore, Santosh et al. (2010); Santosh, Nattee & Lamiroy
(2012) develop a method for analysing handwritten characters based on both the number of
strokes and their spatial information. It consists of four main phases.

step 1. Organise the symbols representing the same character into different groups based on 
the number of strokes. 

step 2. Find the spatial relation between strokes. 

step 3. Agglomerate similar strokes from a specific location in a group. 

step 4. Stroke-wise matching for recognition. 

For a clearer understanding, we explain the aforementioned steps as follows. For a specific
class of character, it is interesting to notice that symbols written with an equal number of
strokes generally produce visually similar structures and are easier to compare.

In every group within a particular class of character, a representative symbol is synthetically
generated by merging pairwise similar strokes which are positioned identically with
respect to the shirorekha. The merging uses the DTW algorithm. The learnt strokes are then
stored accordingly. This step mainly focuses on stroke clustering and management of the learnt
strokes.

We align the individual test strokes of an unknown symbol with the learnt strokes having
both the same number of strokes and the same spatial properties. Overall, symbols can be
compared by the fusion of matching information from all test strokes. This eventually builds a
complete recognition process.

3.1 Stroke Spatial Description and its Need 

The importance of the location of the strokes is best observed by taking a few pairs of
characters that often lead to confusion.

The first character in every such pair has visually two distinguishing features: the particular
location of its shirorekha (more to the right) and a small curve in the text. There is no doubt
that one of the two features is sufficient to automatically distinguish both characters. However,
since small curves are usually not a robust feature in natural handwriting, finding the location
of the shirorekha alone can avoid possible confusion. Our stroke based spatial relation
technique is explained further in the following.

To handle relative positioning of strokes, we use six spatial predicates, i.e., 2x3 relational
regions:

$$\mathcal{R} = \begin{bmatrix} \text{top-left (T-L)} & \text{top (T)} & \text{top-right (T-R)} \\ \text{bottom-left (B-L)} & \text{bottom (B)} & \text{bottom-right (B-R)} \end{bmatrix}$$

For easier understanding, an iconic representation of the aforementioned relational matrix
$\mathcal{R}$ can be expressed as,

o o o
o o •

where the black dot represents presence, i.e., the stroke is found to be in the bottom-right
region.

To confirm the location of the stroke, we use the projection theory: minimum boundary 
rectangle (MBR) Papadias & Sellis (1994) model combined with the stroke's centroid. 

Based on Egenhofer & Herring (1991), we start by checking fundamental topological
relations such as disconnected (DC), externally connected (EC) and overlap/intersect (O/I) by
considering two strokes $s_1$ and $s_2$:

$$s_1 = \{p_k\}_{k=1,\ldots,K} \quad \text{and} \quad s_2 = \{p_l\}_{l=1,\ldots,L},$$

as follows,

$$s_1 \cap s_2 = \begin{cases} \text{EC, O/I} & \text{if } p_k \cap p_l \neq \emptyset \\ \text{DC} & \text{otherwise.} \end{cases}$$

Fig. 7: Pairwise spatial relation for a two-stroke symbol.



We then use the border condition from the geometry of the MBR. It is straightforward for
disconnected strokes, while it is not for externally connected and overlap/intersect
configurations. In the latter case, we check the level of the centroid with respect to the
boundary of the MBR. For example, if a boundary of the shirorekha is above the centroid level
of the text stroke, then it is confirmed that the shirorekha is on the top. This procedure is
applied to all of the six previously mentioned spatial predicates. Note that angle-based models
like bi-centre Miyajima & Ralescu (1994) and angle histogram Wang & Keller (1999) are not an
appropriate choice due to the cursive nature of writing.
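
The role of the MBR and of the centroid can be illustrated with a deliberately simplified sketch. It places a stroke in one of the six regions according to where its centroid falls inside the MBR of the whole symbol, and it checks the "shirorekha on top" example by comparing an MBR boundary with a centroid level. The simplification, the assumption that y grows upwards, and all function names are choices of this sketch, not the exact procedure of the cited work.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]

REGIONS = [["T-L", "T", "T-R"],
           ["B-L", "B", "B-R"]]


def mbr(points: List[Point]) -> Tuple[float, float, float, float]:
    """Minimum boundary rectangle as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def centroid(stroke: Stroke) -> Point:
    n = len(stroke)
    return sum(x for x, _ in stroke) / n, sum(y for _, y in stroke) / n


def is_above(a: Stroke, b: Stroke) -> bool:
    """True if the lower MBR boundary of a lies above the centroid of b
    (assumes y grows upwards)."""
    return mbr(a)[1] > centroid(b)[1]


def region_of(stroke: Stroke, symbol: List[Stroke]) -> str:
    """Assign a stroke to one of the 2x3 regions of the symbol's MBR."""
    x_min, y_min, x_max, y_max = mbr([p for s in symbol for p in s])
    cx, cy = centroid(stroke)
    width = (x_max - x_min) or 1.0
    col = min(2, int(3 * (cx - x_min) / width))
    row = 0 if cy >= (y_min + y_max) / 2 else 1
    return REGIONS[row][col]


shirorekha = [(0, 10), (5, 10), (10, 10)]
text = [(4, 0), (5, 4), (6, 8)]
symbol = [shirorekha, text]
```

With these two hypothetical strokes, the horizontal line is placed in the top region and the text stroke in the bottom region.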

On the whole, assuming that the shirorekha is on the top, the locations of the text strokes are
estimated. This eventually allows us to cross-validate the location of the shirorekha along with
its size, once the locations of the text strokes are determined. Fig. 7 shows a real example
demonstrating relative positioning between the strokes for a two-stroke symbol. Besides,
symbols with two shirorekhas can also be treated. In such a situation, the first shirorekha
according to the order of strokes is taken as reference.

Fig. 8: Relative positions of strokes for one class of character in two different groups, i.e., two-stroke and
three-stroke symbols (panel (b): three-stroke symbol).

3.2 Spatial Similarity based Clustering 

Basically, clustering is a technique for collecting items which are similar in some way. Items of
one group are dissimilar to items belonging to other groups. Consequently, it makes
the recognition system compact. To handle this, we present spatial similarity based stroke
clustering.

As mentioned in previous work Santosh et al. (2010); Santosh, Nattee & Lamiroy (2012), the 
clustering scheme is a two-step process. 

• The first step is to organise symbols representing the same character into different groups,
based on the number of strokes used to complete the symbol. Fig. 8 shows an example of
it for one class of character.

• In the second step, strokes from a specific location are agglomerated hierarchically within
the particular group. Once the relative position of every stroke is determined as shown in
Fig. 8, single-linkage agglomerative hierarchical clustering is used (cf. Fig. 10). This means
that only strokes which are at a specific location are taken for clustering. As an example,
we illustrate it in Fig. 9. This applies to all groups within a class.

In agglomerative hierarchical clustering (cf. Fig. 10), we merge the two most similar strokes to
form a new cluster. The distance computation between two strokes follows Section 2.3. The new
cluster is computed by averaging both strokes via the discrete warping path along
the diagonal DTW-matrix. This process is repeated until it reaches the cluster threshold. The
threshold value yields the number of cluster representatives, i.e., learnt templates.
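
One possible reading of this merging loop is sketched below for strokes given as scalar sequences. The closest pair of representatives is merged by averaging the events aligned on the DTW warping path, and the loop stops once the smallest distance exceeds the threshold. The names `dtw_path` and `cluster` are hypothetical, and the scalar events are a simplifying assumption.

```python
from typing import List, Sequence, Tuple


def dtw_path(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Normalised DTW distance and 0-based warping path (cf. Section 2.3)."""
    K, L = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (L + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for l in range(1, L + 1):
            D[k][l] = (x[k - 1] - y[l - 1]) ** 2 + min(
                D[k - 1][l - 1], D[k - 1][l], D[k][l - 1])
    path, k, l = [], K, L
    while k > 0 and l > 0:
        path.append((k - 1, l - 1))
        _, k, l = min((D[k - 1][l - 1], k - 1, l - 1),
                      (D[k - 1][l], k - 1, l),
                      (D[k][l - 1], k, l - 1))
    path.reverse()
    return D[K][L] / len(path), path


def cluster(strokes: List[Sequence[float]], threshold: float) -> List[List[float]]:
    """Merge the closest pair of representatives until the smallest
    pairwise distance exceeds the threshold."""
    reps = [list(s) for s in strokes]
    while len(reps) > 1:
        d, i, j = min((dtw_path(reps[i], reps[j])[0], i, j)
                      for i in range(len(reps))
                      for j in range(i + 1, len(reps)))
        if d > threshold:
            break
        _, path = dtw_path(reps[i], reps[j])
        # New representative: average of the events aligned by the path.
        merged = [(reps[i][k] + reps[j][l]) / 2 for k, l in path]
        reps = [r for n, r in enumerate(reps) if n not in (i, j)] + [merged]
    return reps


templates = cluster([[0, 1, 2], [0, 1, 2.2], [9, 8, 7]], threshold=0.5)
```

In this toy run the two nearly identical strokes collapse into one averaged template, while the third stroke stays on its own, so the threshold directly controls how many templates are kept.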



Stroke-Based Cursive Character Recognition 



o o o   o o o   o o o         • o o   o • o   o o •
• o o , o • o , o o •   and   o o o , o o o , o o o
   (text clustering)            (shirorekha clustering)

Fig. 9: Clustering technique for each class. Stroke clustering is based on the relative positioning. As a
consequence, we have three clustering blocks for text strokes and the remaining three for the shirorekha.

Fig. 10: Hierarchical stroke clustering concept. At every step, features are merged according to their
similarity up to the provided threshold level.

3.3 Stroke Number and Order Free Recognition 

In natural handwriting, the number of strokes as well as their order vary widely. This happens
from one writing to another, even for the same user, and of course between different
users. Fig. 11 shows the large variation in stroke numbers as well as orders.

Once we have organised the symbols (from a particular class) into groups based on the
number of strokes used, stroke clustering is carried out according to relative
positioning. As a consequence, during recognition, one can write a symbol with any
number and order of strokes, because stroke matching is based on the relative positioning of
the strokes within the matching group and does not need to take the stroke order into account.
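
Why the stroke order drops out can be seen from a small sketch of how learnt strokes might be indexed. The store is keyed by class and number of strokes, then by region, so a test stroke is looked up by its region only. The layout, the function names and the class labels "ka" and "kha" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Stroke = List[float]
Store = Dict[Tuple[str, int], Dict[str, List[Stroke]]]


def build_store(training: Iterable[Tuple[str, List[Tuple[str, Stroke]]]]) -> Store:
    """training yields (label, [(region, stroke), ...]) for every symbol."""
    store: Store = defaultdict(lambda: defaultdict(list))
    for label, strokes in training:
        group = (label, len(strokes))            # group by stroke count
        for region, stroke in strokes:
            store[group][region].append(stroke)  # pool strokes per region
    return store


def candidate_templates(store: Store, n_strokes: int, region: str) -> List[Tuple[str, Stroke]]:
    """Learnt strokes from groups with n_strokes strokes at the given region."""
    return [(label, t)
            for (label, n), regions in store.items() if n == n_strokes
            for t in regions.get(region, [])]


# The two "ka" samples list their strokes in opposite writing order.
training = [
    ("ka", [("T", [0, 1]), ("B", [3, 4])]),
    ("ka", [("B", [3, 5]), ("T", [0, 2])]),
    ("kha", [("T", [0, 1]), ("B-L", [7, 7]), ("B-R", [8, 8])]),
]
store = build_store(training)
bottom_candidates = candidate_templates(store, 2, "B")
```

Both two-stroke samples contribute their bottom stroke to the same pool, whichever stroke was written first.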

3.4 Dataset 



In this work, as before, a publicly available dataset has been employed (cf. Table 1), where a
Graphire tablet (WACOM Co. Ltd.), model ET0405A-U, was used to capture the pen-tip
position in the form of 2D coordinates at the sampling rate of 20 Hz. The data set is composed
of 1800 symbols representing 36 characters, coming from 25 native speakers. Each writer
was given the opportunity to write each character twice. No other directions, constraints, or
instructions were given to the users.

Fig. 11: Different number of strokes and order for one class of character, with panels showing (a), (b)
two-stroke, (c), (d), (f) three-stroke and (e) four-stroke samples. In this illustration, the red dot refers to
the initial pen-tip position so that it is easy to see how many strokes make up a complete
symbol. In addition, stroke ordering differs from one sample to another.

Table 1: Dataset formation and its availability

Item                   Description
Classes of character   36
Users                  25
Dataset size           1800
Visibility             IAPR tc-11, http://www.iapr-tc11.org

3.5 Recognition Performance Evaluation 

While experimenting, every test sample is matched with training candidates and the closest 
one is reported. The closest candidate corresponds to the labelled class, which we call 
'character recognition'. Formally, recognition rate can be defined as the number of correctly 
recognised candidates to the total number of test candidates. 

To evaluate the recognition performance, two different protocols can be employed: 

1. dichotomous classification and 

2. K-fold cross-validation (CV). 



In the case of dichotomous classification, 15 writers are used for training and the remaining 10 
for testing. For K-fold CV, since we have 25 users for data collection, we employ K = 5 in 
order to make the recognition engine writer independent. 

In K-fold CV, the original sample for every class is randomly partitioned into K sub-samples. 
Of the K sub-samples, a single sub-sample is used for validation, and the remaining K - 1 
sub-samples are used for training. This process is then repeated for K folds, with each 
of the K sub-samples used exactly once for validation. Finally, a single value is obtained by 
averaging the K results. The aim of using such a series of rigorous tests is to avoid the sample bias 

Table 2: Error rates (in %) and running time (in sec. per character). The methods are differentiated by 
the additional use of the LB_Keogh tool Keogh (2002) and the evaluation protocol employed. 

Method    # of Mis-recognition    # of Rejection    Avg. Error %    Time (sec.) 
M1.       33                      08                05.0            04 
M2.       24                      08                03.5            02 

Index: 

M1. Santosh, Nattee & Lamiroy (2012). 

M2. Santosh, Nattee & Lamiroy (2012) + Keogh (2002) and 5-fold CV. 


that can arise in conventional dichotomous classification. Compared with the previous 
study Santosh, Nattee & Lamiroy (2012), this is an interesting evaluation protocol. 
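
The two protocols can be sketched as follows. This is a hedged illustration rather than the exact procedure used here: it assumes that folds are formed by partitioning the 25 writers (one reading of the writer-independence requirement), and the names `dichotomous_split`, `k_fold_writer_splits`, `cross_validated_error` and the `evaluate` callback are hypothetical.

```python
import random


def dichotomous_split(writers, n_train=15, seed=0):
    """Single split: n_train writers for training, the rest for testing."""
    shuffled = list(writers)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def k_fold_writer_splits(writers, k=5, seed=0):
    """Yield (train, test) writer lists; every writer is tested exactly once."""
    shuffled = list(writers)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint sub-samples
    for i, test in enumerate(folds):
        train = [w for j, fold in enumerate(folds) if j != i for w in fold]
        yield train, test


def cross_validated_error(writers, evaluate, k=5, seed=0):
    """Average the error returned by evaluate(train, test) over the k folds."""
    errors = [evaluate(train, test)
              for train, test in k_fold_writer_splits(writers, k, seed)]
    return sum(errors) / len(errors)


writers = list(range(1, 26))               # 25 writers, as in the dataset
train, test = dichotomous_split(writers)   # 15 for training, 10 for testing
splits = list(k_fold_writer_splits(writers, k=5))
```

Because no writer appears on both sides of any split, a low averaged error cannot be explained by the engine having already seen a test writer's handwriting.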

3.6 Results and Discussions 

Following the evaluation protocols described above, Table 2 provides the average 
recognition error rates. In the tests, the average error rate drops from 5.0% to 3.5%, 
an improvement of about 1.5 percentage points. 

Based on the results (cf. Table 2), we analyse the recognition performance in terms of the 
observed errors and categorise the origin of the errors that occurred in our experiments. 
As said in Section 1.3, these are mainly due to 

1. structure similarity, 

2. reduced and/or very long ascender and/or descender strokes, and 

3. others such as re-writing strokes and mis-writing. 

Compared to the previous work Santosh, Nattee & Lamiroy (2012), the number of rejections does 
not change, while confusions due to structure similarity have been reduced. This is mainly because 
of the 5-fold CV evaluation protocol. Besides, the running time has been reduced by more than a 
factor of two, i.e., to 2 seconds per character, thanks to the LB_Keogh tool Keogh (2002). 
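
To illustrate why a lower bound such as LB_Keogh shortens the running time, the sketch below prunes DTW computations in a nearest-template search. It is a simplified illustration under stated assumptions, not the chapter's implementation: sequences are 1D and of equal length (real pen data are 2D coordinate sequences), the pointwise cost is the squared difference, `r` is a Sakoe-Chiba band half-width, and all function names are hypothetical.

```python
def dtw(q, c, r):
    """DTW distance (sum of squared differences) within a band of half-width r."""
    n, m = len(q), len(c)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def lb_keogh(q, c, r):
    """Lower bound on dtw(q, c, r) for equal-length sequences.

    Builds the upper/lower envelope of q over the band and sums the squared
    amount by which each point of c falls outside that envelope.
    """
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - r): i + r + 1]
        upper, lower = max(window), min(window)
        if ci > upper:
            total += (ci - upper) ** 2
        elif ci < lower:
            total += (lower - ci) ** 2
    return total


def nearest_with_pruning(query, templates, r):
    """1-NN search that skips DTW whenever the lower bound cannot win."""
    best_label, best_dist, dtw_calls = None, float("inf"), 0
    for label, seq in templates:
        if lb_keogh(query, seq, r) >= best_dist:
            continue                      # cheap bound already too large
        dtw_calls += 1
        dist = dtw(query, seq, r)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist, dtw_calls


query = [0, 1, 2, 3, 2, 1, 0]
templates = [("a", [0, 0, 1, 2, 3, 2, 1]), ("b", [5, 5, 5, 5, 5, 5, 5])]
label, dist, calls = nearest_with_pruning(query, templates, r=1)
```

In this toy run the second template is rejected by the bound alone, so only one full DTW computation is needed; the saving grows with the number of templates that are clearly far from the query.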

4. Conclusions 

In this chapter, an established and validated approach (based on previous 
studies Santosh & Nattee (2006a;b; 2007); Santosh et al. (2010); Santosh, Nattee & Lamiroy 
(2012)) has been presented for on-line natural handwritten Devanagari character recognition. 
It uses the number of strokes used to complete a symbol and their spatial relations (see footnote 1). 
Besides, we have made the dataset publicly available for research purposes. On this 
dataset, the success rate is approximately 97% in less than 2 seconds per character on average. 
Note that, in this chapter, the new evaluation protocol reduces the errors (mainly those due to 
multi-class similarity) and the optimised DTW reduces the processing delay, both of which 
are new findings in comparison to the previous studies. 



1 A comprehensive work based on the relative positioning of handwritten strokes is presented in Santosh, Nattee & 
Lamiroy (2012). Once again, to avoid contradictions, this chapter aims to provide coherent as well as consistent 
studies on Devanagari character recognition. 



Appeared in 'Advances in Character Recognition', Editor - Xiaoqing Ding, ISBN 978-953-51-0823-8 Authors' copy 17 



The proposed approach is able to handle handwritten symbols written with any number of strokes 
and in any order. Moreover, the stroke-matching technique is interesting and completely 
controllable, primarily due to our symbol categorisation and the use of stroke spatial information 
in template management. To handle spatial relations more efficiently (rather than relying only on 
orthogonal projection, i.e., MBR), a more elaborate spatial relation model can be used, Santosh, 
Lamiroy & Wendling (2012) for instance. In addition, machine learning techniques like 
inductive logic programming (ILP) Amin (2000); Santosh et al. (2009) could be used to exploit the 
complete structural properties in terms of a first-order logic (FOL) description. 